Estimating Diffusion Network Structures: Recovery Conditions, Sample Complexity & Soft-thresholding Algorithm
Information spreads across social and technological networks, but often the
network structures are hidden from us and we only observe the traces left by
the diffusion processes, called cascades. Can we recover the hidden network
structures from these observed cascades? What kind of cascades and how many
cascades do we need? Are there some network structures which are more difficult
than others to recover? Can we design efficient inference algorithms with
provable guarantees?
Despite the increasing availability of cascade data and methods for inferring
networks from these data, a thorough theoretical understanding of the above
questions remains largely unexplored in the literature. In this paper, we
investigate the network structure inference problem for a general family of
continuous-time diffusion models using an ℓ1-regularized likelihood
maximization framework. We show that, as long as the cascade sampling process
satisfies a natural incoherence condition, our framework can recover the
correct network structure with high probability if we observe O(d^3 log N)
cascades, where d is the maximum number of parents of a node and N is the
total number of nodes. Moreover, we develop a simple and efficient
soft-thresholding inference algorithm, which we use to illustrate the
consequences of our theoretical results, and show that our framework
outperforms other alternatives in practice. Comment: To appear in the 31st International Conference on Machine Learning
(ICML), 2014.
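At the heart of such ℓ1-regularized estimators is the entrywise soft-thresholding operator used in proximal-gradient updates. Below is a minimal, generic sketch of one such update, not the paper's exact algorithm; the gradient, step size, and regularization weight are placeholders.

```python
import numpy as np

def soft_threshold(x, tau):
    """Entrywise soft-thresholding: the proximal operator of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def proximal_gradient_step(alpha, grad_neg_loglik, step, lam):
    """One proximal-gradient update for an l1-regularized likelihood.

    alpha           : current edge-weight estimates for one node (vector)
    grad_neg_loglik : gradient of the negative log-likelihood at alpha (placeholder)
    step            : step size (placeholder)
    lam             : l1 regularization weight (placeholder)
    """
    return soft_threshold(alpha - step * grad_neg_loglik, step * lam)

# Toy usage: large entries survive (shrunk), small ones are set exactly to zero,
# which is what produces sparse network estimates.
alpha = np.array([0.9, 0.05, -0.4, 0.0])
grad = np.array([0.1, 0.02, -0.05, 0.01])
print(proximal_gradient_step(alpha, grad, step=0.5, lam=0.2))
```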
A glance into the Pathology of Covid-19, Its Current and Possible Treatments; Interleukin Antagonists as an Effective Option; A review.
The outbreak of the novel SARS-CoV-2 and its ensuing complications has caused almost unprecedented chaos throughout the world in recent years. Although a series of vaccines have recently been proposed to reduce the risk of mortality and morbidity of this disease, an ultimate and reliable cure has yet to be discovered. One of the major complications of Covid-19 is the outburst of a series of inflammatory responses in the respiratory system of patients, which eventually causes hypoxemic pneumonitis and accounts for most of the mortality among Covid-19 patients. It is suggested that a group of inflammatory cytokines, notably various interleukins, is responsible for this complication; therefore, drugs that can influence this system may be useful in reducing this exaggerated inflammatory response, which is dubbed the "cytokine storm". In this article we review potential treatment options for reducing the inflammatory response and discuss clinical trials and case reports on drugs that interfere with the responsible interleukins in order to quench the cytokine storm.
 
On the impact of activation and normalization in obtaining isometric embeddings at initialization
In this paper, we explore the structure of the penultimate Gram matrix in
deep neural networks, which contains the pairwise inner products of outputs
corresponding to a batch of inputs. In several architectures it has been
observed that this Gram matrix becomes degenerate with depth at initialization,
which dramatically slows training. Normalization layers, such as batch or layer
normalization, play a pivotal role in preventing the rank collapse issue.
Despite promising advances, the existing theoretical results (i) do not extend
to layer normalization, which is widely used in transformers, and (ii) cannot
characterize the bias of normalization quantitatively at finite depth.
To bridge this gap, we provide a proof that layer normalization, in
conjunction with activation layers, biases the Gram matrix of a multilayer
perceptron towards isometry at an exponential rate with depth at
initialization. We quantify this rate using the Hermite expansion of the
activation function, highlighting the importance of higher-order Hermite
coefficients in the bias towards isometry.
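As a rough numerical illustration of this kind of result, one can track how far the Gram matrix of a batch is from a scaled identity as depth grows in a random multilayer perceptron with layer normalization. The sketch below is an assumption-laden toy experiment (tanh activations, NumPy, an ad-hoc isometry-gap metric), not the paper's exact setup.

```python
import numpy as np

def layer_norm(h, eps=1e-6):
    """Per-sample layer normalization across the feature dimension."""
    mu = h.mean(axis=1, keepdims=True)
    sigma = h.std(axis=1, keepdims=True)
    return (h - mu) / (sigma + eps)

def isometry_gap(h):
    """Distance of the (trace-normalized) batch Gram matrix from the identity."""
    gram = h @ h.T
    gram = gram * (gram.shape[0] / np.trace(gram))  # make the comparison scale-free
    return np.linalg.norm(gram - np.eye(gram.shape[0]))

rng = np.random.default_rng(0)
batch, width, depth = 8, 512, 50
h = rng.normal(size=(batch, width))
for layer in range(1, depth + 1):
    w = rng.normal(size=(width, width)) / np.sqrt(width)  # random Gaussian weights
    h = layer_norm(np.tanh(h @ w))
    if layer % 10 == 0:
        print(f"layer {layer:3d}  isometry gap {isometry_gap(h):.4f}")
```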
Batch Normalization Orthogonalizes Representations in Deep Random Networks
This paper underlines a subtle property of batch-normalization (BN):
Successive batch normalizations with random linear transformations make hidden
representations increasingly orthogonal across layers of a deep neural network.
We establish a non-asymptotic characterization of the interplay between depth,
width, and the orthogonality of deep representations. More precisely, under a
mild assumption, we prove that the deviation of the representations from
orthogonality rapidly decays with depth up to a term inversely proportional to
the network width. This result has two main implications: 1) Theoretically, as
the depth grows, the distribution of the representation -- after the linear
layers -- contracts to a Wasserstein-2 ball around an isotropic Gaussian
distribution. Furthermore, the radius of this Wasserstein ball shrinks with the
width of the network. 2) In practice, the orthogonality of the representations
directly influences the performance of stochastic gradient descent (SGD). When
representations are initially aligned, we observe that SGD wastes many iterations
just to orthogonalize the representations before classification. Nevertheless, we
experimentally show that starting optimization from orthogonal representations
is sufficient to accelerate SGD, with no need for BN.
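A quick way to see the phenomenon numerically is to push a batch of nearly aligned inputs through a deep random linear network with batch normalization and monitor how the pairwise cosine similarities of the representations shrink with depth. The sketch below uses plain mean-and-variance batch normalization and an illustrative orthogonality measure; it is not the paper's exact construction.

```python
import numpy as np

def batch_norm(h, eps=1e-6):
    """Normalize each feature to zero mean and unit variance over the batch."""
    return (h - h.mean(axis=0)) / (h.std(axis=0) + eps)

def orthogonality_deviation(h):
    """Frobenius norm of the off-diagonal pairwise cosine similarities."""
    g = h @ h.T
    d = np.sqrt(np.diag(g))
    cosines = g / np.outer(d, d)
    return np.linalg.norm(cosines - np.diag(np.diag(cosines)))

rng = np.random.default_rng(1)
batch, width, depth = 8, 1024, 60
# Start from nearly aligned samples so the orthogonalizing effect is visible.
base = rng.normal(size=(1, width))
h = base + 0.01 * rng.normal(size=(batch, width))
print(f"input  orthogonality deviation {orthogonality_deviation(h):.4f}")
for layer in range(1, depth + 1):
    w = rng.normal(size=(width, width)) / np.sqrt(width)  # random linear layer
    h = batch_norm(h @ w)
    if layer % 10 == 0:
        print(f"layer {layer:3d}  orthogonality deviation {orthogonality_deviation(h):.4f}")
```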
Transformers learn to implement preconditioned gradient descent for in-context learning
Motivated by the striking ability of transformers for in-context learning,
several works demonstrate that transformers can implement algorithms like
gradient descent. By a careful construction of weights, these works show that
multiple layers of transformers are expressive enough to simulate gradient
descent iterations. Going beyond the question of expressivity, we ask: Can
transformers learn to implement such algorithms by training over random problem
instances? To our knowledge, we make the first theoretical progress toward this
question via analysis of the loss landscape for linear transformers trained
over random instances of linear regression. For a single attention layer, we
prove the global minimum of the training objective implements a single
iteration of preconditioned gradient descent. Notably, the preconditioning
matrix not only adapts to the input distribution but also to the variance
induced by data inadequacy. For a transformer with multiple attention layers, we
prove that certain critical points of the training objective implement one
iteration of preconditioned gradient descent per attention layer. Our results
call for future theoretical studies on learning algorithms by training transformers.
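A toy instance of such a construction can be written down directly: for in-context linear regression, a single softmax-free (linear) attention layer with hand-picked weights produces the same prediction as one step of preconditioned gradient descent from zero initialization. The sketch below checks this equivalence numerically; the token layout, weight choice, and preconditioner are illustrative assumptions, not necessarily the paper's parameterization.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5, 20

# In-context linear regression instance: prompt pairs (x_i, y_i) and a query x_q.
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star
x_q = rng.normal(size=d)

# A symmetric positive semidefinite preconditioner (placeholder choice).
A = rng.normal(size=(d, d))
P = A @ A.T / d

# One step of preconditioned gradient descent on the in-context least-squares
# loss (1/2n) * sum_i (w^T x_i - y_i)^2, starting from w = 0:
# w_1 = P * (1/n) * sum_i x_i y_i.
w_1 = P @ (X.T @ y) / n
pred_gd = w_1 @ x_q

# One linear (softmax-free) attention layer: keys = x_i, values = y_i,
# query = P x_q / n. Its output on the query token reproduces the prediction.
keys, values = X, y
query = P @ x_q / n
pred_attn = values @ (keys @ query)

print(pred_gd, pred_attn)           # the two predictions coincide
assert np.allclose(pred_gd, pred_attn)
```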
Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks
Randomly initialized neural networks are known to become harder to train with
increasing depth, unless architectural enhancements like residual connections
and batch normalization are used. We here investigate this phenomenon by
revisiting the connection between random initialization in deep networks and
spectral instabilities in products of random matrices. Given the rich
literature on random matrices, it is not surprising to find that the rank of
the intermediate representations in unnormalized networks collapses quickly
with depth. In this work we highlight the fact that batch normalization is an
effective strategy to avoid rank collapse for both linear and ReLU networks.
Leveraging tools from Markov chain theory, we derive a meaningful lower bound on
the rank of representations in deep linear networks. Empirically, we also
demonstrate that this rank robustness generalizes to ReLU nets. Finally, we
conduct an extensive set of experiments on real-world data sets, which confirm
that rank stability is indeed a crucial condition for training modern-day deep
neural architectures.